Bias and Variance Approximation in Value Function Estimates
Authors
Abstract
We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.
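The effect described in the abstract can be observed numerically. The sketch below is not the paper's closed-form approximation; it is a plain Monte Carlo illustration under assumed inputs. The 3-state transition matrix `TRUE_P`, the rewards `REWARD`, the discount factor `GAMMA`, and all function names are hypothetical. It evaluates a fixed policy with the true transition probabilities, then repeatedly re-estimates those probabilities from sampled transitions and records how far the resulting plug-in value estimates fall from the true values on average (bias) and how much they scatter (variance).

```python
import random
import statistics

GAMMA = 0.9  # discount factor (assumed for illustration)

# Hypothetical 3-state Markov reward process under a fixed policy.
TRUE_P = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
REWARD = [1.0, 0.0, 5.0]


def evaluate(P, r, gamma=GAMMA, tol=1e-10):
    """Iterate V <- r + gamma * P V until the largest update is below tol."""
    n = len(r)
    v = [0.0] * n
    while True:
        new_v = [
            r[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def estimate_P(P, samples_per_state, rng):
    """Empirical transition matrix built from sampled next states."""
    n = len(P)
    P_hat = []
    for s in range(n):
        draws = rng.choices(range(n), weights=P[s], k=samples_per_state)
        P_hat.append([draws.count(t) / samples_per_state for t in range(n)])
    return P_hat


def simulate(samples_per_state=50, replications=500, seed=0):
    """Return true values, empirical bias, and empirical variance per state."""
    rng = random.Random(seed)
    v_true = evaluate(TRUE_P, REWARD)
    estimates = [
        evaluate(estimate_P(TRUE_P, samples_per_state, rng), REWARD)
        for _ in range(replications)
    ]
    n = len(REWARD)
    bias = [
        statistics.fmean(e[s] for e in estimates) - v_true[s]
        for s in range(n)
    ]
    var = [statistics.variance([e[s] for e in estimates]) for s in range(n)]
    return v_true, bias, var


if __name__ == "__main__":
    v_true, bias, var = simulate()
    for s in range(len(REWARD)):
        print(f"state {s}: V={v_true[s]:.3f} bias={bias[s]:+.3f} var={var[s]:.3f}")
```

Because the value function is a nonlinear function of the transition probabilities, an unbiased estimate of those probabilities does not in general give an unbiased value estimate. Increasing `samples_per_state` should shrink the empirical variance in this sketch.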
Similar resources
Regularized Autoregressive Multiple Frequency Estimation
The paper addresses the problem of tracking multiple frequencies using a Regularized Autoregressive (RAR) approximation. The RAR procedure decreases approximation bias compared to other AR-based frequency detection methods, while still providing competitive variance of sample estimates. We show that the RAR estimates of multiple periodicities are consistent in probabilit...
Value Function Approximation using Multiple Aggregation for Multiattribute Resource Management
We consider the problem of estimating the value of a multiattribute resource, where the attributes are categorical or discrete in nature and the number of potential attribute vectors is very large. The problem arises in approximate dynamic programming when we need to estimate the value of a multiattribute resource from estimates based on Monte-Carlo simulation. These problems have been traditio...
An Application of Non-response Bias Reduction Using Propensity Score Methods
In many statistical studies some units do not respond to some or all of the questions. This situation causes a problem called non-response. Bias and variance inflation are two important consequences of non-response in surveys. Although increasing the sample size can prevent variance inflation, it cannot necessarily adjust for the non-response bias. Therefore a number of methods ...
Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning
We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state action value function, Q. Theory is presented showing that linear function approximation representations of Q can degrade the rate of convergence of performance gradient estimates by a factor of O(ML) relative to when no func...
Bias-Induced Optical Absorption of Current Carrying Two-Orbital Quantum Dot with Strong Electron-Phonon Interaction (Polaron Regime)
The one photon absorption (OPA) cross section of a current carrying two-orbital quantum dot (QD) with strong electron-phonon interaction (polaron regime) is considered. Using the self-consistent non-equilibrium Hartree-Fock (HF) approximation, we determine the dependence of OPA cross section on the applied bias voltage, the strength of effective electron-electron interaction, and level spacing ...
Journal title:
- Management Science
Volume 53, Issue
Pages -
Publication date: 2007